Sentence Level Information Patterns for Novelty Detection

Author

  • XIAOYAN LI
Abstract

July 2006. Xiaoyan Li, B.E., Tsinghua University; M.E., Tsinghua University; Ph.D., University of Massachusetts at Amherst. Directed by: Professor W. Bruce Croft.

The detection of new information in a document stream is an important component of many potential applications. In this thesis, a new novelty detection approach based on the identification of sentence level information patterns is proposed. Given a user's information need, some information patterns in sentences, such as combinations of query words, sentence lengths, named entities and phrases, and other sentence patterns, may contain more important and relevant information than single words. The work of the thesis includes three parts. First, we redefine "what is novelty detection" in light of the proposed information patterns. Examples of several different types of information patterns are given, corresponding to different types of users' information needs. Second, we analyze why the proposed information pattern concept has a significant impact on novelty detection. A thorough analysis of sentence level information patterns is carried out on data from the TREC novelty tracks, covering sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, we present how we perform novelty detection based on information patterns, which focuses on the identification of previously unseen query-related patterns in sentences. A unified pattern-based approach to novelty detection is presented for both specific NE topics and more general topics. Experiments on novelty detection were carried out on data from the TREC 2002, 2003 and 2004 novelty tracks. Experimental results show that the proposed approach significantly improves the performance of novelty detection for both specific and general topics, and therefore the overall performance across all topics, in terms of precision at top ranks. Future research directions are suggested.
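To make the idea of "previously unseen query-related patterns" concrete, here is a minimal Python sketch. It is not the thesis's method: the abstract does not give the algorithm, so the function names (`extract_patterns`, `detect_novel`) are hypothetical, and capitalized non-initial tokens are used as a crude stand-in for named entities. A sentence is flagged as novel if it mentions a query word and introduces at least one pattern not seen in earlier query-related sentences.

```python
import re


def extract_patterns(sentence):
    """Crude stand-in for named-entity extraction (an assumption, not the
    thesis's extractor): capitalized tokens other than the first word."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return {t for t in tokens[1:] if t[0].isupper()}


def detect_novel(sentences, query_words):
    """Return the sentences that mention a query word and contain at least
    one pattern not seen in earlier query-related sentences."""
    query = {w.lower() for w in query_words}
    seen = set()   # patterns observed so far in the stream
    novel = []
    for s in sentences:
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", s)}
        if not (words & query):
            continue  # skip sentences unrelated to the query
        patterns = extract_patterns(s)
        if patterns - seen:
            novel.append(s)  # at least one previously unseen pattern
        seen |= patterns
    return novel
```

For example, given a stream of sentences about an earthquake, a sentence that only repeats an already-seen place name would be skipped, while one naming a new place would be kept. A real system would replace `extract_patterns` with a proper named-entity tagger and would rank sentences rather than apply a hard yes/no rule.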


Similar resources

An information-pattern-based approach to novelty detection

In this paper, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, ‘‘novelty’’ is redefined based on the proposed information patterns, and several different types of information patterns are given corresponding to different types of users’ information needs. Second, a thorough analysis of sentence level information patterns is...


Document-to-Sentence Level Technique for Novelty Detection

Novelty identification is used to distinguish novel data from an incoming stream of documents. In this study, we propose a novel methodology for document-level novelty identification that uses a document-to-sentence-level strategy. This work first splits a document into sentences, decides the novelty of every sentence, then computes the document-level novelty score in view of an al...


Graph-Based Text Representation For Novelty Detection

We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph produced by using sentence-level term distances and pointwise mutual information can ...


Novelty Detection via Answer Updating

The detection of new and novel information in a document stream is an important component of potential applications. This paper describes an answer updating approach to novelty detection at the sentence level. Specifically, we explore the use of question answering techniques for novelty detection. New information is defined as new/previously unseen answers to questions representing a user's info...


Exploring fact-focused relevance and novelty detection

Purpose – Automated sentence-level relevance and novelty detection would be of direct benefit to many information retrieval systems. However, the low level of agreement between human judges performing the task is an issue of concern. In previous approaches, annotators were asked to identify sentences in a document set that are relevant to a given topic, and then to eliminate sentences that do n...



Publication date: 2006